A Distributed Data Storage Scheme for Sensor Networks



Abhishek Parakh and Subhash Kak 

Computer Science Department 
Oklahoma State University, Stillwater OK 74075 
{parakh, subhashk}@cs.okstate.edu



Abstract. We present a data storage scheme for sensor networks that 
achieves the targets of encryption and distributed storage simultane- 
ously. We partition the data to be stored into numerous pieces such that 
at least a specific number of them have to be brought together to recre- 
ate the data. The procedure for creation of partitions does not use any 
encryption key and the pieces are implicitly secure. These pieces are then 
distributed over random sensors for storage. Capture or malfunction of
one or more sensors (fewer than a threshold number) does not compromise
the data. The scheme thus protects against compromise of data in
specific sensors due to physical capture or malfunction.

1 Introduction

Sensors may be deployed in hostile environments where they are prone to dangers
ranging from environmental hazards to physical capture. Under such circumstances,
the data and encryption keys stored on these sensors are vulnerable
to compromise.

A number of techniques have been proposed for secure communication between
sensors using key management and data encryption [1, 2, 3, 4]. Some
techniques aim at secure routing [5, 6] or intrusion detection [7], and others
describe a mechanism for moving sensitive data around in the network from time
to time [8]. A few researchers propose distributed data storage [15, 16, 17] but
either do not adequately address the question of security or rely on explicit
encryption to secure the data, which leaves the question of secure storage of the
encryption/decryption keys unanswered.

To go beyond the present approaches, one may incorporate further secu- 
rity within the system by using another layer that increases the space that the 
intruder must search in order to break a cipher [9, 10]. Here, we propose an 
implicitly secure data partitioning scheme whose security is distributed amongst 
many sensors. In contrast to hash-based distributed security models for wireless 
sensor networks [18, 19], we consider a more general method of data partition- 
ing. In this approach, stored data is partitioned into two or more pieces and 
stored at randomly chosen sensors on the network. In scenarios where one or
more pieces may be in danger of being lost or becoming inaccessible due to sensor
failure or capture, one may employ schemes that can recreate the data from a
subset of the original pieces.

A number of schemes have been proposed in the communications context for 
splitting and sharing of decryption keys [11, 20]. These schemes fall under the 
category of "secret sharing schemes", where the decryption key is considered 
to be a secret. Motivated by the need to have an analog of the case where 
several officers must simultaneously use their keys before a bank vault or a safe 
deposit box can be opened, these schemes do not consider the requirement of 
data protection for a single party. Further, in any secret sharing scheme it is 
assumed that the encrypted data is stored in a secure place and that none of it 
can be compromised without the decryption key. 

In this paper, we protect data by distributing its parts over various sensors. 
The idea of making these partitions is a generalization of the use of 3 or 9 roots 
of a number in a cubic transformation [12]. The scheme we present is simple and 
easily implementable. 

We would like to stress that the presented scheme is different from Shamir's
secret sharing scheme, which takes advantage of polynomial interpolation.
Further, Shamir's scheme maps the secret as points on the y-axis, whereas the
scheme proposed in this paper maps the secret as points on the x-axis, as roots
of a polynomial.

2 Proposed Data Partitioning Scheme 

By the fundamental theorem of algebra, every equation of degree k has k roots.
We use this fact to partition data into k partitions such that each partition
is stored on a different sensor. No explicit encryption of data is required to secure
each partition. The partitions in themselves do not reveal any information and 
hence are implicitly secure. Only when all the partitions are brought together is 
the data revealed. 

Consider an equation of degree k,

$$x^k + a_{k-1}x^{k-1} + a_{k-2}x^{k-2} + \dots + a_1 x + a_0 = 0 \tag{1}$$

Equation (1) has k roots, denoted by $\{r_1, r_2, \dots, r_k\} \subseteq \mathbb{C}$,
and can be rewritten as

$$(x - r_1)(x - r_2)\cdots(x - r_k) = 0 \tag{2}$$

In cryptography, it is more convenient to use the finite field $\mathbb{Z}_p$, where p is a large
prime. If we replace $a_0$ in (1) with the data $d \in \mathbb{Z}_p$ that we wish to partition,
then

$$x^k + \sum_{i=1}^{k-1} a_{k-i}x^{k-i} + d \equiv 0 \pmod{p} \tag{3}$$

where $0 \le a_i \le p-1$ and $0 \le d \le p-1$. (Note that one may alternatively use
$-d$ in (3) instead of $d$.) This may be rewritten as

$$\prod_{i=1}^{k} (x - r_i) \equiv 0 \pmod{p} \tag{4}$$

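where $1 \le r_i \le p-1$. The roots, $r_i$, are the partitions. It is clear that the term
d in (3) is independent of the variable x and therefore

$$\prod_{i=1}^{k} r_i \equiv d \pmod{p} \tag{5}$$

(Expanding (4) gives the constant term $(-1)^k \prod_{i=1}^{k} r_i$, so (5) holds as written
for even k, and for odd k when $-d$ is used in (3), as in Example 1.)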

If we allow the coefficients in (3) to take values $a_1 = a_2 = \dots = a_{k-1} = 0$,
then (3) will have k roots only if $\mathrm{GCD}(p-1, k) \ne 1$ and $\exists\, b \in \mathbb{Z}_p$ such that d
is the kth power of b. One simple way to choose such a p would be to choose a
prime of the form $(k \cdot s + 1)$, where $s \in \mathbb{N}$. However, such a choice would not
provide good security because knowledge of the number of roots and one of the
partitions would be sufficient to recreate the original data by computing the kth
power of that partition. Furthermore, not all values of d will have a kth root and
hence one cannot use any arbitrary integer, which would typically be required.
Therefore, one of the restrictions on choosing the coefficients is that not all of
them are simultaneously zero.

For example, if the data needs to be divided into two parts, then an equation
of second degree is chosen and its roots computed. If we represent this general
equation by

$$x^2 + a_1 x + d \equiv 0 \pmod{p} \tag{6}$$

then the two roots can be calculated by solving the following equation modulo
p,

$$x = \frac{-a_1 \pm \sqrt{a_1^2 - 4d}}{2} \tag{7}$$

which has a solution in $\mathbb{Z}_p$ only if the square root $\sqrt{a_1^2 - 4d}$ exists modulo p. If
the square root does not exist, then a different value of $a_1$ needs to be chosen. We
present a practical way of choosing the coefficients below. However, this brings
out the second restriction on the coefficients, i.e., they should be chosen such
that a solution to the equation exists in $\mathbb{Z}_p$.
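
The existence check for the square root in (7) can be illustrated with a short sketch. It is not part of the scheme as stated: the function name `solve_quadratic` is hypothetical, and the sketch assumes an odd prime with p mod 4 = 3 (true for p = 31) so that a square root can be taken with a single modular exponentiation.

```python
def solve_quadratic(a1, d, p):
    """Roots of x^2 + a1*x + d = 0 (mod p), or None if none exist in Z_p.

    Assumption of this sketch: p is an odd prime with p % 4 == 3, so a
    square root of a quadratic residue s is s^((p+1)/4) mod p.
    """
    if p % 4 != 3:
        raise ValueError("this sketch only handles primes p with p % 4 == 3")
    disc = (a1 * a1 - 4 * d) % p
    # Euler's criterion: a non-zero disc is a square iff disc^((p-1)/2) == 1.
    if disc != 0 and pow(disc, (p - 1) // 2, p) != 1:
        return None  # a different a1 has to be chosen
    root = pow(disc, (p + 1) // 4, p)
    inv2 = pow(2, -1, p)  # modular inverse, Python 3.8+
    return ((-a1 + root) * inv2 % p, (-a1 - root) * inv2 % p)


print(solve_quadratic(24, 10, 31))  # two roots whose product is 10 mod 31
print(solve_quadratic(1, 10, 31))   # None: the discriminant is not a square
```

With p = 31 and d = 10, the choice a_1 = 24 yields two roots in the field, while a_1 = 1 does not, which is the situation in which the text asks for a different a_1.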

Theorem 1. If the coefficients $a_i$, $1 \le i \le k-1$, in equation (3) are not all
simultaneously zero and are chosen randomly and uniformly from the field, then the
knowledge of any $k-1$ roots of the equation, such that equation (4) holds, does
not provide any information about the value of d with a probability greater than
that of a random guess of 1/p.

Proof. Given a specific d, the coefficients in (3) can be chosen to satisfy (4) in
$\mathbb{Z}_p$ in the following manner. Choose randomly and uniformly from the field
$k-1$ roots $r_1, r_2, \dots, r_{k-1}$. Then the kth root can be computed by solving
the following equation,

$$r_k = d \cdot (r_1 \cdot r_2 \cdots r_{k-1})^{-1} \bmod p \tag{8}$$

Since the roots are randomly chosen from a uniform distribution in $\mathbb{Z}_p$, the
probability of guessing $r_k$ without knowing the value of d is 1/p. Conversely, d
cannot be estimated with a probability greater than 1/p without knowing the
kth root $r_k$. □

It follows from Theorem 1 that data is represented as a product of k numbers
in the finite field.

Example 1. Let data d = 10, prime p = 31, and let k = 3. We need to
partition the data into three parts, for which we will need to use a cubic equation,
$x^3 + a_2 x^2 + a_1 x - d \equiv 0 \pmod{p}$.

We can find the equation satisfying the required properties using Theorem 1.
Assume $(x - r_1)(x - r_2)(x - r_3) \equiv 0 \pmod{31}$. We randomly choose 2 roots from the
field, $r_1 = 19$ and $r_2 = 22$. Therefore, $r_3 = d \cdot (r_1 \cdot r_2)^{-1} = 10 \cdot (19 \cdot 22)^{-1} \bmod 31 = 11$.
The equation becomes $(x - r_1)(x - r_2)(x - r_3) = x^3 - 21x^2 + x - 10 \equiv 0 \pmod{31}$,
where the coefficients are $a_1 = 1$ and $a_2 = -21$ and the partitions are 11, 22 and
19.
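
The construction used in Theorem 1 and Example 1 can be sketched in a few lines of Python. The function names (`partition`, `last_root`, `recombine`) are illustrative only, and the restriction to non-zero d is an assumption of the sketch so that every root stays in the range 1 to p − 1.

```python
import secrets


def last_root(d, chosen, p):
    """Equation (8): r_k = d * (r_1 * ... * r_{k-1})^{-1} mod p."""
    prod = 1
    for r in chosen:
        prod = prod * r % p
    return d * pow(prod, -1, p) % p  # pow(x, -1, p) needs Python 3.8+


def partition(d, k, p):
    """Split d into k roots whose product is d modulo the prime p."""
    if not 1 <= d < p:
        # Assumption of this sketch: d is non-zero so that all roots are non-zero.
        raise ValueError("d must lie in 1..p-1")
    chosen = [secrets.randbelow(p - 1) + 1 for _ in range(k - 1)]
    return chosen + [last_root(d, chosen, p)]


def recombine(roots, p):
    """Equation (5): the data is the product of all k roots modulo p."""
    prod = 1
    for r in roots:
        prod = prod * r % p
    return prod


print(last_root(10, [19, 22], 31))                # third partition of Example 1
print(recombine(partition(10, 3, 31), 31))        # recovers d from fresh partitions
```

Calling `last_root(10, [19, 22], 31)` reproduces the third partition of Example 1, and `recombine` applied to any output of `partition` returns the original d.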

Choosing the Coefficients: We described above two conditions that must 
be satisfied by the coefficients. The first condition was that not all the coefficients 
are simultaneously zero and second that the choice of coefficients should result in 
an equation with roots in Z p . Since no generalized method for solving equations 
of degree higher than 4 exists [21], a numerical method must be used which 
becomes impractical as the number of partitions grows. An easier method to 
compute the coefficients is exemplified by Theorem 1 and Example 1. 

One might ask why we should want to compute the coefficients if we already
have all the roots. We answer this question a little later.

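Introducing Redundancy: In situations where the data pieces stored on
one or more (fewer than a threshold number of) sensors in the sensor network
may not be accessible, the other sensors should be able to collaborate to recreate
the data from the available pieces. The procedure outlined below extends the k
partitions to n partitions such that only k of them need to be brought together to
recreate the data. If $\{r_1, r_2, \dots, r_k\}$ is the original set of partitions, then they can
be mapped into a set of n partitions $\{p_1, p_2, \dots, p_n\}$ by the use of a mapping
function based on linear algebra. If we construct n equations, any k of which are
linearly independent, such that

$$\begin{aligned}
a_{11}r_1 + a_{12}r_2 + \dots + a_{1k}r_k &= c_1 \\
a_{21}r_1 + a_{22}r_2 + \dots + a_{2k}r_k &= c_2 \\
&\;\;\vdots \\
a_{n1}r_1 + a_{n2}r_2 + \dots + a_{nk}r_k &= c_n
\end{aligned}$$

where the $a_{ij}$ are numbers randomly and uniformly chosen from the finite field $\mathbb{Z}_p$,
then the n new partitions are $p_i = \{a_{i1}, a_{i2}, \dots, a_{ik}, c_i\}$, $1 \le i \le n$. The above
linear equations can be written as a matrix operation,

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nk} \end{bmatrix}
\begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_k \end{bmatrix} =
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}$$

To recreate $r_j$, $1 \le j \le k$, from the new partitions, any k of them can be brought
together and the resulting $k \times k$ system solved modulo p,

$$\begin{bmatrix} r_1 \\ \vdots \\ r_k \end{bmatrix} =
\begin{bmatrix} a_{i_1 1} & \cdots & a_{i_1 k} \\ \vdots & & \vdots \\ a_{i_k 1} & \cdots & a_{i_k k} \end{bmatrix}^{-1}
\begin{bmatrix} c_{i_1} \\ \vdots \\ c_{i_k} \end{bmatrix}$$

where $i_1, \dots, i_k$ index the k partitions that were retrieved.

A feature of the presented scheme is that new partitions may be added and
deleted without affecting any of the existing partitions.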
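
A minimal sketch of this k-of-n extension is given below. The names `make_shares` and `recover_roots` are hypothetical, recovery uses plain Gauss-Jordan elimination modulo p, and a seeded `random.Random` is used only to keep the example reproducible; a deployment would presumably draw the coefficients from a cryptographically secure source. The prime 2^31 − 1 is chosen here so that a randomly drawn k × k system is singular only with negligible probability.

```python
import random


def make_shares(roots, n, p, rng):
    """Map k roots to n partitions p_i = (a_i1, ..., a_ik, c_i) over Z_p."""
    shares = []
    for _ in range(n):
        row = [rng.randrange(p) for _ in roots]
        c = sum(a * r for a, r in zip(row, roots)) % p
        shares.append((row, c))
    return shares


def recover_roots(shares, p):
    """Solve the k x k system given by exactly k partitions, modulo the prime p."""
    k = len(shares)
    m = [list(row) + [c] for row, c in shares]  # augmented matrix [A | c]
    for col in range(k):
        pivot = next((i for i in range(col, k) if m[i][col] % p), None)
        if pivot is None:
            raise ValueError("chosen partitions are not linearly independent")
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], -1, p)
        m[col] = [v * inv % p for v in m[col]]
        for i in range(k):
            if i != col and m[i][col]:
                f = m[i][col]
                m[i] = [(v - f * w) % p for v, w in zip(m[i], m[col])]
    return [m[i][k] for i in range(k)]


p = 2147483647                      # the Mersenne prime 2^31 - 1
shares = make_shares([19, 22, 11], 5, p, random.Random(1))
print(recover_roots([shares[4], shares[1], shares[3]], p))
```

Any three of the five partitions suffice here; if a chosen subset happens to be linearly dependent, `recover_roots` raises an error and a different subset can be tried.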

An alternate approach to partitioning: Once a sensor has computed all 
the k roots then it may compute the equation resulting from (4) and store one 
or all of the roots on different sensors and the coefficients on different sensors. 
Recreation of original data can now be performed in two ways: either using 
(5) or choosing one of the roots at random and retrieving the coefficients and 
substituting the appropriate values in the equation (3) to compute 

$$-a_0 = x^k + a_{k-1}x^{k-1} + a_{k-2}x^{k-2} + \dots + a_1 x \bmod p \tag{9}$$

Therefore, the recreated data is $d = (p - a_0) \bmod p$. Parenthetically, this may
provide a scheme for fault tolerance and partition verification. Additionally, a 
sensor may store just one of the roots and k — 1 coefficients on the network. 
These together represent k partitions. 
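
As an illustration of recreating the data from a single root and the stored coefficients via (9), the following sketch evaluates the non-constant part of the polynomial at the root. The function name `data_from_root` and the list layout `[a_1, ..., a_{k-1}]` are assumptions made for the example.

```python
def data_from_root(root, coeffs, p):
    """Equation (9): evaluate x^k + a_{k-1}x^{k-1} + ... + a_1 x at a root.

    coeffs holds [a_1, ..., a_{k-1}] of the monic polynomial; the result is
    -a_0 mod p, which equals the data d.
    """
    k = len(coeffs) + 1
    total = pow(root, k, p)
    for i, a in enumerate(coeffs, start=1):
        total += a * pow(root, i, p)
    return total % p


# Coefficients of Example 1: a_1 = 1, a_2 = -21, with p = 31.
for r in (19, 22, 11):
    print(r, data_from_root(r, [1, -21], 31))
```

Each of the three roots of Example 1 yields the same value of d, which is what makes this form usable for partition verification.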

Note. Distinct sets of coefficients (for a given constant term in the equation)
result in distinct sets of roots and vice versa. This is because two distinct sets
of coefficients represent distinct polynomials, since two polynomials are
equal if and only if they have the same coefficients. By the fundamental
theorem of algebra, every polynomial has a unique set of roots.

Two sets of roots $R_1$ and $R_2$ are distinct if and only if $\exists\, r_i\, \forall\, r_j\, (r_i \ne r_j)$,
where $r_i \in R_1$ and $r_j \in R_2$. To compute the corresponding polynomials and
the two sets of coefficients $C_1$ and $C_2$, we form $\prod_{i=1,\, r_i \in R_1}^{k} (x - r_i) \equiv 0 \pmod{p}$
and $\prod_{j=1,\, r_j \in R_2}^{k} (x - r_j) \equiv 0 \pmod{p}$ and read the coefficients from the resulting
polynomials, respectively. It is clear that at least one of the factors of the two
polynomials is distinct because at least one of the roots is distinct; hence the
resulting polynomials for distinct sets of roots are distinct.

Theorem 2. Determining the coefficients of a polynomial of degree $k \ge 2$ in a
finite field $\mathbb{Z}_p$, where p is prime, by brute force requires
$\Omega\!\left(\left\lceil \frac{p^{k-1}}{(k-1)!} \right\rceil\right)$ computations.

Proof. If A represents the set of coefficients
$A = \{a_1, a_2, \dots, a_{k-1}\}$, where $0 \le a_i \le p-1$, then by the above note each distinct
instance of set A gives rise to a distinct set of roots $R = \{r_1, r_2, \dots, r_k\}$, and,
conversely, every distinct instance of set R gives rise to a distinct set of coefficients
A. Therefore, if we fix d to a constant, then (5) can be used to compute
the set of roots. Every distinct set of roots is therefore a $(k-1)$-combination of a
multiset [13, 14], where each element has infinite multiplicity, and equivalently
a $(k-1)$-combination of the set $E = \{0, 1, 2, \dots, p-1\}$ with repetition allowed. Thus,
the number of possibilities for the choices of coefficients is given by the following
expression

$$\begin{aligned}
\left(\!\!\binom{p}{k-1}\!\!\right) &= \binom{p-1+(k-1)}{k-1} \\
&= \frac{(p+k-2)!}{(p-1)!\,(k-1)!} \\
&= \frac{(p+k-2)(p+k-3)\cdots(p+k-k)\,(p-1)!}{(p-1)!\,(k-1)!} \\
&= \frac{(p+k-2)(p+k-3)\cdots p}{(k-1)!} \\
&\approx \left\lceil \frac{p^{k-1}}{(k-1)!} \right\rceil
\end{aligned} \tag{10}$$

Here we have used the fact that in practice $p \gg k \ge 2$, hence the result. We
have ignored the one prohibited case of all coefficients being zero, which has no
effect on our result. □
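
To get a feel for the count in (10), the exact multiset coefficient can be compared with the approximation $p^{k-1}/(k-1)!$. The snippet below is only a numerical illustration; the helper names are hypothetical.

```python
from math import comb, factorial


def coefficient_sets(p, k):
    """Exact count: (k-1)-combinations with repetition from a p-element set."""
    return comb(p + k - 2, k - 1)


def approximation(p, k):
    """The approximation p^(k-1) / (k-1)! used in (10), valid for p >> k."""
    return p ** (k - 1) / factorial(k - 1)


for p, k in [(31, 3), (2147483647, 3)]:
    exact = coefficient_sets(p, k)
    print(p, k, exact, exact / approximation(p, k))
```

For the toy prime p = 31 the two values differ by a few percent, while for a prime near 2^31 the ratio is already very close to 1, in line with the assumption $p \gg k$.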



3 A stronger variation to the protocol modulo a composite 
number 

An additional layer of security may be added to the implementation by performing
computations modulo a composite number $n = p \cdot q$, where p and q are primes, and
using an encryption exponent to encrypt the data before computing the roots of
the equation. In such a variation, knowledge of all the roots and coefficients of
the equation will not reveal any information about the data, and the adversary
will need to know the secret factors of n. For this we can use an equation such
as the one below:

$$x^k + \sum_{i=1}^{k-1} a_{k-i}x^{k-i} + d^y \equiv 0 \pmod{n} \tag{11}$$

where y is a secretly chosen exponent and GCD(y, n) = 1. If the coefficients are 
chosen such that (11) has k roots then 

$$\prod_{i=1}^{k} r_i \equiv d^y \pmod{n} \tag{12}$$

Appropriate coefficients may be chosen in a manner similar to that described
in previous sections. It is clear that compromise of all the roots and coefficients
will at most reveal $c = d^y \bmod n$. In order to compute the original value of
d, the adversary will require the factors of n, which are held secret by the sensor
that owns the data.
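
The composite-modulus variation might be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the toy primes 61 and 53 are far too small for real use, and recovery of d assumes, as in RSA, that the exponent y is invertible modulo (p − 1)(q − 1), a condition the sketch imposes on top of the text.

```python
from math import gcd


def encrypt_then_partition(d, y, n, chosen_roots):
    """Form c = d^y mod n and complete the roots so their product is c (mod n)."""
    c = pow(d, y, n)
    prod = 1
    for r in chosen_roots:
        if gcd(r, n) != 1:
            raise ValueError("each chosen root must be invertible modulo n")
        prod = prod * r % n
    return list(chosen_roots) + [c * pow(prod, -1, n) % n]


def recover(roots, y, p, q):
    """Multiply the roots to obtain c, then undo the exponent using p and q."""
    n = p * q
    c = 1
    for r in roots:
        c = c * r % n
    # Assumption borrowed from RSA: gcd(y, (p-1)(q-1)) == 1.
    y_inv = pow(y, -1, (p - 1) * (q - 1))
    return pow(c, y_inv, n)


p, q, y = 61, 53, 17                # toy parameters for illustration only
roots = encrypt_then_partition(65, y, p * q, [19, 22])
print(recover(roots, y, p, q))      # the original data, given the secret factors
```

A party holding all the roots but not the factors of n obtains only c, as (12) indicates; the last step in `recover` is the part that needs p and q.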



4 Addressing the Data Partitions 

The previous sections consider the security of the proposed scheme when prime 
p and composite n are public knowledge. However, there is nothing that compels 
the user to disclose the values of p and n. If we assume that p and n are secret 
values, then the partitions may be stored in the form of an "encrypted linked list",
which is a list in which every pointer is in encrypted form; to find
out which node the present node points to, a party needs to decrypt the pointer,
which can be done only if certain secret information is known.

If we assume that p and n are public values then the pointer can be so 
encrypted that each decryption either leads to multiple addresses or depends on 
the knowledge of the factors or both. Only the legitimate party will know which 
of the multiple addresses is to be picked. 

Alternatively, one may use a random number generator and generate a random
sequence of sensor IDs using a secret seed. One way to generate this seed
is to compute the hash of the original data, use it to seed the generator,
and keep the hash secret.
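
The seeded-sequence idea can be sketched as below. The choice of SHA-256, the function name `sensor_sequence`, and the assumption that sensor IDs are the integers 0 to `num_sensors` − 1 are all illustrative. Python's `random.Random` is a Mersenne Twister and is not cryptographically secure, so it stands in here for whatever generator an actual deployment would use.

```python
import hashlib
import random


def sensor_sequence(data: bytes, num_sensors: int, k: int):
    """Derive k distinct sensor IDs from a seed equal to the hash of the data."""
    seed = int.from_bytes(hashlib.sha256(data).digest(), "big")
    rng = random.Random(seed)  # NOT cryptographically secure; illustration only
    return rng.sample(range(num_sensors), k)


ids = sensor_sequence(b"sensor reading", 100, 3)
print(ids)
print(ids == sensor_sequence(b"sensor reading", 100, 3))  # same seed, same sequence
```

Only a party that knows the secret hash can regenerate the sequence and therefore locate the sensors that hold the partitions.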

5 Future Work and Other Applications 

Our approach leads to interesting research issues such as the optimal way to 
distribute the data pieces in a network of n sensors. Also in the case when the 
sensors are moving, one needs to investigate as to how the partitions need to be 
reallocated so that the original sensor is always able to access the pieces when 
needed. Further, questions of load balancing, so that no one sensor stores a
very large number of partitions and the partitions are as evenly distributed over
the network as possible, need to be investigated.

Yet another application of the presented scheme is in Internet voting pro- 
tocols. Internet voting is a challenge for cryptography because of its opposite 
requirements of confidentiality and verifiability. There is the further restriction 
of "fairness" that the intermediate election results must be kept secret. One of 
the ways to solve this problem is to use multiple layers of encryption such that 
the decryption key for each layer is available with a different authority. This
obviously leaves open the question of who is to be entrusted with the encrypted
votes.

A more effective way to implement fairness would be to avoid encryption 
keys altogether and divide each cast ballot into k or more pieces such that each 
authority is given one of the pieces [22]. This solves the problem of entrusting
any single authority with all the votes, and if any of the authorities (fewer than
the threshold) try to cheat by deleting or modifying some of the cast ballots,
then the votes may be recreated using the remaining partitions. Such a system
implicitly provides a back-up for the votes.

6 Conclusions 

We have introduced a new distributed data storage scheme for sensor networks.
In this scheme, data is partitioned in such a way that each partition is
implicitly secure and does not need to be encrypted. Reconstruction of the data
requires access to a threshold number of sensors that store the data partitions.

An additional variation of the scheme, in which the data partitions need to be
brought together in a definite sequence, may be devised. One way to accomplish
this is by representing the partitions in the following manner:

$p_1, p_1(p_2), p_2(p_3), \dots$, where $p_i(p_j)$ represents the encryption of $p_j$ by means of
$p_i$. Such a scheme will increase the complexity of the brute-force decryption task
for an adversary.